Selecting level-specific specialized vocabulary using statistical measures

Abstract

To find an easy-to-use, automated tool for identifying technical vocabulary suited to learners at various levels, nine statistical measures were applied to the 7.3-million-word ‘commerce and finance’ component of the British National Corpus. The resulting word lists showed that each statistical measure extracted a different level of specialized vocabulary, as measured by word length, vocabulary level, U.S. native speaker grade level, and Japanese school textbook vocabulary coverage, and that these measures produced level-specific words. Beginning-level basic business words were identified using the cosine measure and the complementary similarity measure; intermediate-level business words were extracted using log-likelihood, the chi-square test, and the chi-square test with Yates’s correction; and advanced-level business word lists were created using mutual information and McNemar’s test. We conclude that these statistical measures are effective tools for identifying multi-level specialized vocabulary for pedagogical purposes.
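As a rough illustration of how such measures score a single word, the sketch below computes four of the named statistics from a 2x2 contingency table. The abstract does not give the formulations the authors used, so these are the common textbook versions (Pearson's chi-square, the Yates-corrected chi-square, Dunning's log-likelihood G2, and pointwise mutual information). The function name `keyness`, the cell layout, and the counts in the usage line are assumptions made for illustration only. The cosine measure, the complementary similarity measure, and McNemar's test are left out because their exact definitions cannot be recovered from the abstract.

```python
import math


def keyness(a, b, c, d):
    """Score one word from a 2x2 table (hypothetical layout, not the paper's):

    a = occurrences of the word in the specialized corpus
    b = occurrences of the word in the reference corpus
    c = all other tokens in the specialized corpus
    d = all other tokens in the reference corpus

    Assumes every row and column total is non-zero.
    """
    n = a + b + c + d
    rows = (a + b, c + d)   # word total, non-word total
    cols = (a + c, b + d)   # specialized corpus size, reference corpus size
    denom = rows[0] * rows[1] * cols[0] * cols[1]
    diff = abs(a * d - b * c)

    # Pearson's chi-square for a 2x2 table.
    chi2 = n * diff ** 2 / denom

    # Chi-square with Yates's continuity correction (clamped at zero).
    yates = n * max(0.0, diff - n / 2) ** 2 / denom

    # Log-likelihood ratio (G2): 2 * sum of O * ln(O / E) over the four cells.
    observed = ((a, b), (c, d))
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to the sum
                e = rows[i] * cols[j] / n
                g2 += o * math.log(o / e)
    g2 *= 2

    # Pointwise mutual information between the word and the specialized corpus.
    pmi = math.log2(a * n / (rows[0] * cols[0])) if a > 0 else float("-inf")

    return {"chi2": chi2, "yates": yates, "log_likelihood": g2, "pmi": pmi}


# Hypothetical counts: the word appears 20 times in a 100-token specialized
# sample and 10 times in a 100-token reference sample.
scores = keyness(20, 10, 80, 90)
```

Ranking every word in the corpus by one of these scores and keeping the top of the list yields one candidate word list per measure, which is the kind of output the abstract compares across measures.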

Similar resources

Extracting Level-Specific Science and Technology Vocabulary

With rapid advances in technology come rapid advances in the language of technology, or English for Science and Technology (EST). We have had success in our earlier research in devising a systematic means of extracting level- and domain-specific words from the British National Corpus. In this study, we apply a similar methodology to the Corpus of Professional English (CPE), a 20-million-word comp...

Speech Recognition Methods and their Potential for Dialogue Systems in Mobile Environments

The DaimlerChrysler speech recognizer is specialized for robust speech recognition in noisy environments, in particular for command and control applications. The recognizer that is used in cars has fixed grammars, which restrict the speaker to using short commands. This paper presents methods that allow the user to speak more freely and add spontaneous words to the commands: language modelling,...

A Suite to Compile and Analyze an LSP Corpus

This paper presents a series of tools for extracting specialized corpora from the web and subsequently analyzing them, mainly with statistical techniques. It is an integrated system of original as well as standard tools and has a modular design that facilitates its re-integration on different systems. The first part of the paper describes the original techniques, which are devoted to the ...

The Specialized Vocabulary of Modern Patent Language: Semantic Associations in Patent Lexis

This paper presents an analysis of the language of patents, as a contribution to the field of English for Specific Purposes (ESP). While existing work appears to fill a niche in the ESP field (particularly in English for Occupational Legal Purposes), the present study insists that a statistical approach is necessary for compiling a patent technical word list for ESP. Since research studies on ...

Measuring Similarity between Flamenco Rhythmic Patterns

Music similarity underlies a large part of a listener’s experience, as it relates to familiarity and associations between different pieces or parts. Rhythmic similarity has received scant research attention in comparison with other aspects of music similarity such as melody or harmony. Mathematical measures of rhythmic similarity have been proposed, but none of them has been compared to human j...

Publication date: 2005